- Title
- Using deep learning to detect food intake behavior from video
- Creator
- Rouast, Philipp V.
- Relation
- University of Newcastle Research Higher Degree Thesis
- Resource Type
- thesis
- Date
- 2020
- Description
- Research Doctorate - Doctor of Philosophy (PhD)
- Description
The rising prevalence of non-communicable diseases calls for more sophisticated approaches to support individuals in engaging in healthy lifestyle behaviors, particularly in terms of their dietary intake. Accurate information on dietary intake forms the basis for assessing a person's diet and delivering dietary interventions. Whereas such information is traditionally sourced in an active process through memory recall, recent research has investigated how dietary monitoring can be turned into a passive process by partial automation using sensor data and machine learning. Accurate detection of individual intake gestures in particular is a key step towards automatic dietary monitoring. The main goal of this research is to design a system for automatic detection of intake gestures from video, which could be leveraged to improve the active dietary monitoring process. To lay the groundwork for our research, we reviewed the literature on automatic dietary monitoring and proposed a research framework for user assistance systems that combine active and passive methods of dietary monitoring. Facing a lack of existing literature on automatic dietary monitoring from video, we additionally reviewed the literature on affect recognition to identify the state of the art in deep learning for human-computer interaction. To facilitate this research, we collected and developed the Objectively Recognizing Eating Behavior and Associated Intake (OREBA) dataset. OREBA is the first public multimodal dataset of eating occasions with annotated intake gestures that includes both video and inertial sensor data. It comprises a total of 9,069 intake gestures from 180 unique participants across 202 eating sessions in two separate scenarios. In our first study on the OREBA dataset, we demonstrated the feasibility of detecting intake gestures from video by combining a two-stage approach with state-of-the-art deep learning architectures from video action recognition. In our second study on the OREBA dataset, we improved upon the first study in several ways by proposing a novel single-stage approach. The proposed approach (i) uses simplified labels, (ii) achieves improved performance, and (iii) works both for intake gesture detection and for eating and drinking gesture detection. We show that these benefits hold for both video and inertial sensor data, and for both the OREBA dataset and the established Clemson dataset.
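The abstract describes the two-stage approach only at a high level. As a rough illustration (a minimal sketch under our own assumptions, not the thesis's exact method), the code below shows how a second stage of such a scheme might convert per-frame intake probabilities, as produced by a stage-one deep video model, into discrete gesture detections via thresholding and a local-maximum search. The function name and the `threshold` and `min_gap` parameters are hypothetical choices for this sketch.

```python
import numpy as np

def detect_intake_events(frame_probs, threshold=0.5, min_gap=16):
    """Illustrative stage two of a two-stage scheme: turn per-frame
    intake probabilities (from a hypothetical stage-one video model)
    into discrete gesture detections.

    frame_probs: 1-D array of per-frame probabilities in [0, 1].
    threshold:   minimum probability for a frame to count as intake.
    min_gap:     minimum frames between two detections, so one
                 gesture is not reported twice.
    Returns a list of frame indices, one per detected gesture.
    """
    events = []
    last_event = -min_gap  # allow a detection at frame 0
    above = frame_probs >= threshold
    i, n = 0, len(frame_probs)
    while i < n:
        if above[i]:
            # Find the contiguous run of frames above threshold
            j = i
            while j < n and above[j]:
                j += 1
            # Report the run's most confident frame as the gesture
            peak = i + int(np.argmax(frame_probs[i:j]))
            if peak - last_event >= min_gap:
                events.append(peak)
                last_event = peak
            i = j
        else:
            i += 1
    return events

# Synthetic probability trace with two "gestures"
probs = np.concatenate([
    np.zeros(30), np.hanning(20) * 0.9,   # first gesture
    np.zeros(50), np.hanning(20) * 0.8,   # second gesture
    np.zeros(30),
])
print(detect_intake_events(probs))  # prints [39, 109]
```

Stage one would be any video action recognition network that emits one intake probability per frame; the sketch above is agnostic to that choice.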
- Subject
- deep learning; automatic dietary monitoring; video sensors; inertial sensors; affective computing; human-computer interaction; intake gesture detection; thesis by publication
- Identifier
- http://hdl.handle.net/1959.13/1429291
- Identifier
- uon:38695
- Rights
- Copyright 2020 Philipp V. Rouast
- Language
- eng
- Full Text
File | Description | Size | Format
---|---|---|---
ATTACHMENT01 | Thesis | 11 MB | Adobe Acrobat PDF
ATTACHMENT02 | Abstract | 198 KB | Adobe Acrobat PDF